QCon London '23 - A New Era for Database Design with TigerBeetle

  • Published: 2 Oct 2024

Comments • 24

  • @themichaelw · 5 months ago +6

    18:00 that's the same Andres Freund who discovered the XZ backdoor. Neat.

  • @LewisCampbellTech · 10 months ago +4

    Every month or so I'll watch this talk while cooking. The first time I didn't really understand Part I. This time around I got most of it. Crazy how the Linux kernel prioritised users yanking USB sticks out over database durability.

    • @jorandirkgreef · 10 months ago +1

      Thanks Lewis, it's special to hear this, and I hope the “durability” of the flavors in your cooking is all the better! ;)

  • @uncleyour3994 · 1 year ago +2

    Really good stuff

  • @Peter-bg1ku · 4 months ago

    I never thought Redis's AOF was this simple.

  • @dannykopping · 1 year ago +7

    Super talk! Very information-dense and clear, with a strong narrative.
    Also so great to hear a South African accent in highly technical content on the big stage 😊

  • @tenthlegionstudios1343 · 1 year ago +5

    Very good talk. It ties together a lot of the previous deep dives I have watched. I am curious about the points made about the advantages of the single-threaded execution model, especially in the context of using the VOPR / having deterministic behavior.
    When you look at something like Redpanda, with its thread-per-core architecture using Seastar and a bunch of advanced Linux features, do design choices like that make it harder to test and to get some sense of deterministic bug reproduction? This is not a tradeoff I had ever considered before, and for a DB that is most concerned with strict serializability and no data loss, it must have greatly shaped the design. I am curious about the potential speedups at the cost of losing TigerBeetle's deterministic nature, not to mention the cognitive load of a more complex codebase.

    • @tigerbeetledb · 1 year ago +2

      Thanks, great to hear that! We are huge fans of Redpanda, and indeed RP and TB share a similar philosophy (direct async I/O, single binary, and of course, single thread per core). In fact, we did an interview on these things with Alex Gallego, CEO of Redpanda, last year: ruclips.net/video/jC_803mW448/видео.html
      With care, it's possible to design a system for concurrency from the outset, so that it can then run either single-threaded or in parallel, or with varying degrees of parallelism chosen by the operator at runtime, with the same deterministic result, even across the cluster as a whole (sketch below). Dominik Tornow has a great post comparing concurrency with parallelism and determinism (the latter two are orthogonal, which is what makes this possible): dominik-tornow.medium.com/a-tale-of-two-spectrums-df6035f4f0e1
      For example, within TigerBeetle's own LSM-Forest storage engine, we are planning to have parts of the compaction process eventually run across threads, but with deterministic effects on the storage data file.
      For now, we're focusing on single-core performance, to see how far we can push that before we introduce CPU thread pools (separated by ring buffers) for things like sorting or cryptography. The motivation for this is Frank McSherry's “Scalability! But at what COST?”, which is a great read! www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html
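
      To make the concurrency-versus-parallelism point above concrete, here is a minimal C sketch of the idea (a hypothetical illustration, not TigerBeetle's code): shard the work up front and merge in a fixed order, and the result is independent of the thread schedule.

          /* Deterministic parallelism: partition the work up front and merge
           * in a fixed order, so the result never depends on which thread
           * finishes first. Compile with: cc -pthread det.c */
          #include <pthread.h>
          #include <stdint.h>
          #include <stdio.h>

          #define NSHARDS 4
          #define N 1024

          static uint64_t input[N];
          static uint64_t shard_result[NSHARDS];

          static void *worker(void *arg) {
              size_t shard = (size_t)arg;
              size_t per = N / NSHARDS;
              uint64_t h = 0;
              /* Each shard's result depends only on its own slice of the input. */
              for (size_t i = shard * per; i < (shard + 1) * per; i++)
                  h = h * 31 + input[i];
              shard_result[shard] = h;
              return NULL;
          }

          int main(void) {
              for (size_t i = 0; i < N; i++) input[i] = i;

              pthread_t tids[NSHARDS];
              for (size_t s = 0; s < NSHARDS; s++)
                  pthread_create(&tids[s], NULL, worker, (void *)s);
              for (size_t s = 0; s < NSHARDS; s++)
                  pthread_join(tids[s], NULL);

              /* Merge in shard order, never in completion order: this is what
               * keeps the output deterministic at any degree of parallelism. */
              uint64_t result = 0;
              for (size_t s = 0; s < NSHARDS; s++)
                  result = result * 37 + shard_result[s];
              printf("%llu\n", (unsigned long long)result);
              return 0;
          }

      The same program run with NSHARDS threads, or with a single thread calling worker in a loop, prints the same value, which is what keeps bugs reproducible under a simulator.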

    • @tenthlegionstudios1343 · 1 year ago +1

      @tigerbeetledb These articles are gold. Thanks for the in-depth reply! Can't wait to see where this all goes.

  • @jonathanmarler5808 · 1 year ago +2

    Great talk. I'm at 15:20 and have to comment. Even if you crash and restart to handle fsync failure, that still doesn't address the problem, because another process could have called fsync and marked the pages as clean, meaning the database process would never see an fsync failure.

    • @jorandirkgreef · 1 year ago

      Hey Jonathan, thanks! Agreed, for sure. I left that out to save time, and because it's nuanced (a few kernel patches ameliorate this). Ultimately, Direct I/O is the blanket fix for all of these issues with buffered I/O. Awesome to see you here and glad you enjoyed the talk! Milan '24?! :)
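
      To illustrate what that blanket fix looks like, here is a minimal C sketch of Direct I/O on Linux (a hedged example, not TigerBeetle's code): O_DIRECT bypasses the page cache, so a write error surfaces to the process that issued the write instead of being eaten by background writeback.

          /* Direct I/O sketch. O_DIRECT requires the buffer, file offset, and
           * length to be aligned; 4096 is a common safe alignment, but the
           * true requirement is device- and filesystem-dependent. */
          #define _GNU_SOURCE
          #include <fcntl.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>

          int main(void) {
              int fd = open("journal.dat", O_RDWR | O_CREAT | O_DIRECT, 0644);
              if (fd < 0) { perror("open"); return 1; }

              void *buf;
              if (posix_memalign(&buf, 4096, 4096) != 0) return 1;
              memset(buf, 0x42, 4096);

              if (pwrite(fd, buf, 4096, 0) != 4096) { perror("pwrite"); return 1; }

              /* fsync is still needed to flush the device's write cache; treat
               * any failure as fatal (crash and restart) rather than retrying. */
              if (fsync(fd) != 0) { perror("fsync"); return 1; }

              free(buf);
              return close(fd) == 0 ? 0 : 1;
          }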

  • @pervognsen_bitwise · 1 year ago +1

    Thanks for the talk, Joran.
    Genuine question since I don't know and it's very surprising to me: Is there really no way to get buffered write syscall backpressure on Linux? The Windows NT kernel has a notoriously outdated (and slow) IO subsystem which has always provided mandatory disk write backpressure by tracking the number of outstanding dirty pages. So if disk block writes cannot keep up with the rate of dirty pages, the dirty page counter will reach the cap and will start applying backpressure by blocking write syscalls (and also blocking page-faulting writes to file-mapped pages, though the two cases differ in the implementation details).
    I'm assuming Linux's choice to not have backpressure must be based on fundamental differences in design philosophy, closely related to the situation with memory overcommit? Certainly the NT design here hurts bursty write throughput in cases where you want to write an amount that is large enough that it exceeds the dirty page counter limit but not so large that you're worried about building up a long-term disk backlog (a manually invoked batch-mode program like a linker would fall in this category). Or you're worried about accumulating more than a desired amount of queueing-induced latency that would kill the throughput of fsync-dependent applications; considering this point makes me think that you wouldn't want to rely on any fixed dirty page backpressure policy anyway, since you want to control the max queuing-induced latency.
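
    For what it's worth, Linux only exposes global writeback knobs (vm.dirty_bytes, vm.dirty_background_bytes) rather than per-process backpressure, so a common application-side workaround is to meter your own dirty data with sync_file_range. A hedged C sketch of that pattern (an illustration, not anything from the talk):

        /* Self-imposed write backpressure on Linux: after each window of
         * buffered writes, kick off async writeback for it and block until
         * the previous window has been written back, bounding dirty data
         * to roughly two windows. */
        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        #define WINDOW (8 * 1024 * 1024) /* dirty-data budget per window */

        int main(void) {
            int fd = open("bulk.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }

            static char chunk[65536]; /* zero-initialized payload */
            off_t written = 0, synced = 0;

            for (int i = 0; i < 1024; i++) {
                if (write(fd, chunk, sizeof chunk) != sizeof chunk) {
                    perror("write");
                    return 1;
                }
                written += sizeof chunk;
                if (written - synced >= WINDOW) {
                    /* Start async writeback of the window just filled... */
                    sync_file_range(fd, synced, WINDOW, SYNC_FILE_RANGE_WRITE);
                    /* ...and wait for the window before it, so we never run
                     * ahead of the disk by more than ~2 * WINDOW bytes. */
                    if (synced >= WINDOW)
                        sync_file_range(fd, synced - WINDOW, WINDOW,
                                        SYNC_FILE_RANGE_WAIT_BEFORE |
                                        SYNC_FILE_RANGE_WRITE |
                                        SYNC_FILE_RANGE_WAIT_AFTER);
                    synced = written;
                }
            }
            return close(fd) == 0 ? 0 : 1;
        }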

  • @luqmansen · 7 months ago +1

    Great talk, thanks for sharing! ❤

  • @rabingaire · 1 year ago +2

    Wow, what an amazing talk!

  • @stevesteve8098 · 4 months ago

    Yes... I remember back in the '90s Oracle tried this system of "direct I/O", blew lots of trumpets, and announced it HAS to be better & faster because... "insert reasoning here".
    Well, you know what... it was complete bullshit, because they made lots of assumptions and did very little real testing.
    Because even if you THINK you are writing directly to the "disk", YOU ARE NOT...
    You are writing to a BLACK BOX. You have absolutely NO idea of HOW or WHAT is implemented in that black box.
    There may be a thousand buffer levels in that box, with all sorts of swings and roundabouts.
    So... no... you are NOT writing directly to disk. Such a basic lack of insight and depth of thought is a worry with this sort of "data" evangelism...

    • @jorandirkgreef · 3 months ago +1

      Thanks Steve, I think we're actually in agreement here. That's why TigerBeetle was designed with an explicit storage fault model, where we expect literally nothing of the "disk" (whether physical or virtualized). For example, we fully expect that I/O may be sent to the wrong sector or corrupted, and we test this to extreme lengths with the storage fault injection that we do.
      Again, we fully expect to be running in virtualized environments or across the network, or on firmware that doesn't fsync etc., and pretty much all of TigerBeetle was designed with this in mind.
      However, at the same time, to be clear, this talk is not so much about the "disk" as hardware; it's about the kernel page cache as software, and what the kernel page cache does in response to I/O errors (whether from a real disk or a virtual disk).
      We're really trying to shine a spotlight on the terrific work coming out of UW-Madison in this regard: www.usenix.org/system/files/atc20-rebello.pdf
      To summarize their findings: while Direct I/O is by no means sufficient, it is still necessary. It's just one of many little things you need to get right, if you have an explicit storage fault model and you want to preserve as much durability as you can.
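
      The shape of that kind of storage fault injection can be shown in a few lines of C. This is a toy version of the idea (not TigerBeetle's actual test harness): checksum every block, corrupt one in the test, and assert that the read path notices.

          /* Fault-injection sketch: the test flips a bit in a block, as a
           * misdirected or corrupted write would, and the read path must
           * detect it via the checksum. */
          #include <assert.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <string.h>

          #define BLOCK 512

          struct block { uint8_t data[BLOCK]; uint64_t checksum; };

          /* FNV-1a is enough for a demo; a real engine would want a much
           * stronger hash. */
          static uint64_t hash(const uint8_t *p, size_t n) {
              uint64_t h = 1469598103934665603ULL;
              for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ULL; }
              return h;
          }

          static int read_block_ok(const struct block *b) {
              return hash(b->data, BLOCK) == b->checksum;
          }

          int main(void) {
              struct block b;
              memset(b.data, 0xAB, BLOCK);
              b.checksum = hash(b.data, BLOCK);
              assert(read_block_ok(&b));

              b.data[137] ^= 0x01; /* inject a single-bit storage fault */
              assert(!read_block_ok(&b));
              puts("corruption detected");
              return 0;
          }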

  • @YuruCampSupermacy · 1 year ago +2

    Absolutely loved the talk.

  • @youtux2 · 1 year ago +1

    Absolutely amazing.

  • @timibolu · 7 months ago +1

    Amazing. Really amazing.

  • @asssheeesh2 · 1 year ago +4

    That was really great!

  • @dwylhq874 · 3 months ago +1

    This is one of the few channels I have *notifications on* for. 🔔
    TigerBeetle is _sick_ !! Your whole team is _awesome_ !! 😍
    So stoked to _finally_ be using this in a real project! 🎉
    Keep up the great work. 🥷

    • @jorandirkgreef · 3 months ago

      Thank you so much! You're sicker still! :) And we're also so stoked to hear that!